import numpy as np
import pandas as pd
import copy
import geopandas as gpd
import plotly.graph_objs as go
import plotly as py
from plotly.offline import init_notebook_mode, iplot
import matplotlib.pyplot as plt
from copy import deepcopy
import matplotlib as mpl
from scipy.stats import norm
from itertools import repeat
from numpy.random import normal
from scipy.stats import ttest_1samp
import warnings
warnings.filterwarnings('ignore')
# init_notebook_mode()
plt.rcParams["figure.figsize"] = (10,5)
The U.S. government incarcerates over 1,500,000 inmates in the prison system, which is the largest in the world and keeps growing. Amid concerns about the unsustainable growth of our prison system, begging in the 1980s, prison privatization became a booming industry under government programs to cut back on the federal workforce. The Justice Departments has been contracting private prison corporations for the incarceration of prisoners.
To provide clients of the private prison industry a status report of the current market, this project aims to survey the prisoner population dynamics of the U.S. justice and correction system, and summarize the longitudinal trends of incarceration. A forecast model is built by learning the pattern presented in past data and to predict change of the prisoner population in the future. This study can offer important information for business decision for clients and investors in the prison industry, such as regional expansion of revenue and contract for operation of custody facilities in the future.
Names of U.S. jurisdictions (states) and their abbreviation are organized into a DataFrame. The abbreviation will be used as geological labels for visualizing data in the form of a map.
states = pd.read_excel("./messy_data/states.xlsx")
# Delete District of Columbia and add a Abbraviation FED for Federal,
# because DC is underneath the Federal jurisdiction.
states.drop(states.index[states.Abbreviation == "DC"], inplace=True)
states.columns = ["Jurisdiction", "Jurisdiction Abbreviation"]
states = states.append({"Jurisdiction" : "Federal", "Jurisdiction Abbreviation" : "FED"}, ignore_index=True)
states.tail()
The U.S. Census Bureau divides the country into 4 regions -- Northeast, Midwest, South and West. Each region is futhre divided into several divisions. These nomenclatures are used to generate regional aggregations of the data.
regions = pd.read_excel("./messy_data/state_region.xlsx")
# Drop District of Columbia.
regions.drop(regions.index[regions.State == "District of Columbia"], inplace=True)
regions.head()
The U.S. Bureau of Justice Statistics surveys the inmates annually and publishes the Prisoners In XXXX series, reporting the demographics of the U.S. prisoner population. They congregate the data in both federal and states’ male and female prison system. The data table is then melted to an analysis-ready and clean format with rows representing observation while columns indicates variables.
malePrisonerPopulation = pd.read_excel("./messy_data/Prisoners under the jurisdiction of state or federal correctional authorities 1978-2016.xlsx", sheet_name="Male", header=9, nrows=54).dropna(1, "all").dropna(0)
femalePrisonerPopulation = pd.read_excel("./messy_data/Prisoners under the jurisdiction of state or federal correctional authorities 1978-2016.xlsx", sheet_name="Female", header=9, nrows=54).dropna(1, "all").dropna(0)
totalPrisonerPopulation = pd.read_excel("./messy_data/Prisoners under the jurisdiction of state or federal correctional authorities 1978-2016.xlsx", sheet_name="Total", header=9, nrows=54).dropna(1, "all").dropna(0)
def cleanPopulationTable(table, states):
# add DC data to Federal, and delete it
table.loc[table.Jurisdiction == "Federal"].iloc[:, 1:] = table.loc[table.Jurisdiction == "Federal"].iloc[:, 1:] + table.loc[table.Jurisdiction == "District of Columbia"].iloc[:, 1:].replace("--", 0)
table.drop(table.index[table.Jurisdiction == "District of Columbia"], inplace=True)
meltedTable = table.melt(id_vars = "Jurisdiction", var_name = "Year", value_name = "Population")
table.set_index("Jurisdiction", inplace=True)
table.columns = table.columns.astype(np.int)
table.columns.name = "Year"
table = table.astype(np.float)
return table, meltedTable
Originally, District of Columbia was treated as an independent jurisdiction in Bureau of Justice Statistics's survey. However, as of 2001, it is being considered as part of the federal jurisdiction. This created some missing values for the District of Columbia data and the federal data starting in 2001. To correct for the inconsistency, the District of Comubia data prior to 2001 is added into the federal prisoner population during data cleaning.
malePrisonerPopulation, malePrisonerPopulation_melted = cleanPopulationTable(malePrisonerPopulation, states)
femalePrisonerPopulation, femalePrisonerPopulation_melted = cleanPopulationTable(femalePrisonerPopulation, states)
totalPrisonerPopulation, totalPrisonerPopulation_melted = cleanPopulationTable(totalPrisonerPopulation, states)
totalPrisonerPopulation.head()
totalPrisonerPopulation_melted.head()
The data are then visualized in a U.S. map, with pseudo-color indicating the prison population in each geological jurisdiction.
def plotAnnualStatePopulation(table, gender, years_chosen):
table = states.merge(table, on="Jurisdiction")
table = table[table.Jurisdiction != "Federal"]
cmax = table.Population.max()
cmin = table.Population.min()
years = table.Year.unique()
heatMapData = [{"type" : "choropleth", "locations" : table.loc[table.Year == year, "Jurisdiction Abbreviation"], "locationmode" : "USA-states", "colorscale" : "Viridis", "zmin" : cmin, "zmax" : cmax, "z" : table.Population[table.Year == year].astype(float)} for year in years]
mapLayout = [{"geo" : {"scope" : 'usa'}, "title" : gender + " Prisoner Population by State, " + str(year)} for year in years]
for i, data in enumerate(heatMapData):
if years[i] in years_chosen:
usHeatMap = go.Figure(data=[data], layout=mapLayout[i])
# iplot(usHeatMap)
usHeatMap.show(renderer="svg")
plotAnnualStatePopulation(totalPrisonerPopulation_melted, "Total", [1978, 2016])
plotAnnualStatePopulation(malePrisonerPopulation_melted, "Male", [1978, 2016])
plotAnnualStatePopulation(femalePrisonerPopulation_melted, "Female", [1978, 2016])
Compared to the data in 1978, the prisoner population in most jurisdictions has increased in both the male and female systems considerably as far as the 2016 data show. The elevation is especially prominent in California, Texas and Florida.
The U.S. Bureau of Justice Statistics also surveys the statistics about annual admission and release in each jurisdiction. These data are very important in terms of building a model that accounts for the temporal dynamics of the prisoner population.
maleAdmission = pd.read_excel("./messy_data/admissions.xlsx", sheet_name="Male").dropna().replace("/", np.nan)
maleRelease = pd.read_excel("./messy_data/releases.xlsx", sheet_name="Male").dropna().replace("/", np.nan)
femaleAdmission = pd.read_excel("./messy_data/admissions.xlsx", sheet_name="Female").dropna().replace("/", np.nan)
femaleRelease = pd.read_excel("./messy_data/releases.xlsx", sheet_name="Female").dropna().replace("/", np.nan)
totalAdmission = pd.read_excel("./messy_data/admissions.xlsx", sheet_name="Total").dropna().replace("/", np.nan)
totalRelease = pd.read_excel("./messy_data/releases.xlsx", sheet_name="Total").dropna().replace("/", np.nan)
def cleanCountTable(table, states, tableType):
table.replace([".", "/"], [np.nan, np.nan], inplace=True)
for i, row in table.iterrows():
if row.isnull().any():
table.loc[i, 1 :] = pd.to_numeric(row.iloc[1 :]).interpolate()
# add DC data to Federal, and delete it
table.loc[table.Jurisdiction == "Federal"].iloc[:, 1:] = table.loc[table.Jurisdiction == "Federal"].iloc[:, 1:] + table.loc[table.Jurisdiction == "District of Columbia"].iloc[:, 1:].replace("--", 0)
table.drop(table.index[table.Jurisdiction == "District of Columbia"], inplace=True)
meltedTable = table.melt(id_vars = "Jurisdiction", var_name = "Year", value_name = tableType)
table.set_index("Jurisdiction", inplace=True)
table.columns = table.columns.astype(np.int)
table.columns.name = "Year"
table = table.astype(np.float)
return table, meltedTable
The inconsistency that involves the District of Columbia and federal numbers, as pointed out for the population data above, also exists for the annual admission and release data. It is corrected in the same fashion as mentioned before.
Some jurisdiction also failed to report their data during certain years. The corresponding missing values are interpolated linearly.
maleAdmission, maleAdmission_melted = cleanCountTable(maleAdmission, states, "Admission")
maleRelease, maleRelease_melted = cleanCountTable(maleRelease, states, "Release")
femaleAdmission, femaleAdmission_melted = cleanCountTable(femaleAdmission, states, "Admission")
femaleRelease, femaleRelease_melted = cleanCountTable(femaleRelease, states, "Release")
totalAdmission, totalAdmission_melted = cleanCountTable(totalAdmission, states, "Admission")
totalRelease, totalRelease_melted = cleanCountTable(totalRelease, states, "Release")
totalAdmission.head()
totalAdmission_melted.head()
The annual population, admission and release data are aggregated to reflect the sum in regions and divisions according to the U.S. Census Bureau's framework. The aggregation makes it possible to model the data on a regional level.
def aggregateRegionalPopulationSum(meltedTable, regions):
meltedTable = regions.merge(meltedTable, left_on="State", right_on="Jurisdiction", how="right")
temp_index = meltedTable["Jurisdiction"] == "Federal"
meltedTable.loc[temp_index, "Region"] = "Federal"
meltedTable.loc[temp_index, "Division"] = "Federal"
regionSum = meltedTable.groupby(["Year", "Region"]).Population.sum().unstack(0)
divisionSum = meltedTable.groupby(["Year", "Region", "Division"]).Population.sum().unstack(0)
return regionSum, divisionSum
malePrisonerPopulationRegionSum, malePrisonerPopulationDivisionSum = aggregateRegionalPopulationSum(malePrisonerPopulation_melted, regions)
femalePrisonerPopulationRegionSum, femalePrisonerPopulationDivisionSum = aggregateRegionalPopulationSum(femalePrisonerPopulation_melted, regions)
totalPrisonerPopulationRegionSum, totalPrisonerPopulationDivisionSum = aggregateRegionalPopulationSum(totalPrisonerPopulation_melted, regions)
totalPrisonerPopulationRegionSum
totalPrisonerPopulationDivisionSum
def aggregateRegionalCountSum(meltedTable, regions, tableType):
meltedTable = regions.merge(meltedTable, left_on="State", right_on="Jurisdiction", how="right")
temp_index = meltedTable["Jurisdiction"] == "Federal"
meltedTable.loc[temp_index, "Region"] = "Federal"
meltedTable.loc[temp_index, "Division"] = "Federal"
regionSum = meltedTable.groupby(["Year", "Region"])[tableType].sum().unstack(0)
divisionSum = meltedTable.groupby(["Year", "Region", "Division"])[tableType].sum().unstack(0)
return regionSum, divisionSum
maleAdmissionRegionSum, maleAdmissionDivisionSum = aggregateRegionalCountSum(maleAdmission_melted, regions, "Admission")
maleReleaseRegionSum, maleReleaseDivisionSum = aggregateRegionalCountSum(maleRelease_melted, regions, "Release")
femaleAdmissionRegionSum, femaleAdmissionDivisionSum = aggregateRegionalCountSum(femaleAdmission_melted, regions, "Admission")
femaleReleaseRegionSum, femaleReleaseDivisionSum = aggregateRegionalCountSum(femaleRelease_melted, regions, "Release")
totalAdmissionRegionSum, totalAdmissionDivisionSum = aggregateRegionalCountSum(totalAdmission_melted, regions, "Admission")
totalReleaseRegionSum, totalReleaseDivisionSum = aggregateRegionalCountSum(femaleRelease_melted, regions, "Release")
The Bureau of Justice Statistics also provides reports about the correction facility occupancy for the 2010s. Occupancy rate is defined as the ratio of prisoner population to the capacity of the facilities.
occupancyLowRate_11_16 = pd.read_excel("./messy_data/Prison occupation rate.xlsx", "Highest Capacity", na_values="/")
occupancyHighRate_11_16 = pd.read_excel("./messy_data/Prison occupation rate.xlsx", "Lowest Capacity", na_values="/")
def cleanAndMeltCustodyOccupancyTable(table, states):
table = table.melt(id_vars = "Jurisdiction", var_name = "Year", value_name = "Occupancy")
table.Occupancy = table.Occupancy / 100
table.Year = table.Year.astype(int)
return table
meltedOccupancyLowRate_11_16 = cleanAndMeltCustodyOccupancyTable(occupancyLowRate_11_16, states)
meltedOccupancyHighRate_11_16 = cleanAndMeltCustodyOccupancyTable(occupancyHighRate_11_16, states)
def plotAnnualStateOccupancy(table, low_or_high, years_chosen):
table = states.merge(table, on="Jurisdiction")
table = table[table.Occupancy != "Federal"]
cmax = table.Occupancy.max()
cmin = table.Occupancy.min()
years = table.Year.unique()
heatMapData = [{"type" : "choropleth", "locations" : table.loc[table.Year == year, "Jurisdiction Abbreviation"], "locationmode" : "USA-states", "colorscale" : "Viridis", "zmin" : cmin, "zmax" : cmax, "z" : table.Occupancy[table.Year == year].astype(float)} for year in years]
mapLayout = [{"geo" : {"scope" : 'usa'}, "title" : "Prison Occupancy (" + low_or_high + " Estimation) by State, " + str(year)} for year in years]
for i, data in enumerate(heatMapData):
if years[i] in years_chosen:
usHeatMap = go.Figure(data=[data], layout=mapLayout[i])
# iplot(usHeatMap)
usHeatMap.show(renderer="svg")
plotAnnualStateOccupancy(meltedOccupancyLowRate_11_16, "Low", [2011, 2016])
plotAnnualStateOccupancy(meltedOccupancyHighRate_11_16, "High", [2011, 2016])
The data suggest during the second decade of the 21st century, the occupancy rate of U.S. prisons are reducing. This is especially true for some states such as California. However, for many states, their prisons are still operating over 100% of its capacity.
As the forecast model is able to predict future prisoner population. It can be divided by the current capacity to make projection about the future prison ocuupancy rate, if no new facility is being built in the future.
capacity = pd.read_excel("./messy_data/facility capacity 2016.xlsx").replace(["…", "/"], np.nan)
capacity = capacity.set_index("Jurisdiction").max(axis=1)
capacity.dropna(inplace=True)
capacity = capacity.astype(np.float)
capacity.name = "Capacity"
capacity.head()
The data are also aggregated and summed to reflect prison capacity on a regional level.
capacitySum = regions.merge(capacity, left_on="State",right_on="Jurisdiction")
capacitySum.columns = ["Jurisdiction", "Division", "Region", "Capacity"]
capacitySum.iloc[-1] = ["Federal", "Federal", "Federal", capacity["Federal"]]
capacitySum = capacitySum[["Region", "Division", "Jurisdiction", "Capacity"]]
capacityRegionSum = capacitySum.groupby(["Region"]).Capacity.sum()
capacityDivisionSum = capacitySum.groupby(["Region", "Division"]).Capacity.sum()
capacityRegionSum
capacityDivisionSum
The U.S. Sentencing Commission is an independent agency of the judicial branch of the federal government. It composes the sentencing guidelines for the U.S. federal courts. It also publishes annual statistics pertaining to the criminal case sentencing. They survey the average national sentencing every year.
annualSentence = pd.read_excel("./messy_data/national sentencing.xlsx").iloc[:, [0,1]]
annualSentence.columns = ["Year", "Sentence"]
annualSentence.set_index("Year", inplace=True)
annualSentence = annualSentence.iloc[:-1,0]
annualSentence.plot()
plt.xticks(np.array(range(1995, 2017, 3)))
plt.ylabel("Sentence [Year]")
plt.title("Average National Sentencing")
This project uses the average national sentencing data spanning from 1995 to 2016 to build a model that can indicate sentencing statistics in the future. It seems that at the national level, sentences are getting shorter. However, it is difficult to justify building a model that is centered around the idea of a more lenient justice system in the long run because only a limited number of data points are available, making it impossibe to distinguish whether this trend is merely a short-term adjustment or a sustained effect. Thus, a normal distribution is adopted to sample future sentences.
Given data of annual population $p[t]$, admission $a[t]$ and release $r[t]$, we can model the yearly dynamics of prisoner population with a first degree difference stochastic process $$p[t] - p[t - 1] = a[t] - r[t] + \varepsilon p[t - 1], \varepsilon \sim N(0,\delta^2 )$$ where $\varepsilon p[t - 1]$ is the Gaussian error term and $\varepsilon$ indicates the error is proportional to the previous population. $\varepsilon$ should be zero-centered with a variance of $\delta^2$.
Every year it is assumed that any prisoner's remaining sentence to serve is also a random variable $s[t]$ which is normally distributed. The center of that distribution is year-dependent and denoted $\mu [t]$, while the variance $\sigma^2$ is assumed to be invariate. The annual release can be approximated with such distribution's cumulative distribution function $$r[t] = F(1;\mu [t - 1],{\sigma ^2})p[t - 1]$$ $F(1)$ represents the proportion of prisoners whose remaining sentence is less than 1 year.
A second degree difference equation is used to describe the temporal evolution of the combined remaining sentence of the whole prisoner body $$p[t]\mu [t] = p[t - 1](\mu [t - 1] - 1) + 0.5r[t] + a[t]y[t] - 0.5a[t]$$ $p[t - 1](\mu [t - 1] - 1)$ represents the previous year's population's average number of remaining years to serve reduces for by 1 year. However, because $r[t]$ prisoners are being released, 1 year of decrease is an over-estimation. On average, for each prisoner released, a half-year over-estimation is introduced. Thus, it is compensated by the $+0.5r[t]$ term given the prisoners are released evenly throughout the year. $y[t]$ is the average sentence of the admitted prisoner of that year. $a[t]y[t]$ represents the contribution of the newly admitted prisoners. Similarly, for each newly admitted prisoner, because the individual will have served some time by the end of the year, we introduced a half-year over-estimation, which is corrected by the $-0.5a[t]$ term.
def extractAreaPopulationCount(table, area, tableType="Population", year_begin=1978,year_end=2016):
# Extract a series of population, admission or release data for an area (state, region or division)
# area is a pandas index
data = table.loc[area]
data = data[np.logical_and(data.index <= year_end, data.index >= year_begin)]
data.name = tableType
data.index.name="Year"
return data
def propogateMu(past_mu,past_p, current_p, current_a,current_r, current_y):
# State propogation function derived from the equilibrium of population * mu
return (past_p * (past_mu - 1) + 0.5 * current_r + current_a * (current_y - 0.5)) / current_p
def solveMu(initialMu,initialPopulation, population, admission,release, anualSentence):
# Given the initial state of mu and populatin,
# this function can calculate the evolution of mu
# according to the revolution of population, admission, release, and average sentence
mu = population * 0
mu.name = "Mu"
for i, p in enumerate(population):
if i == 0:
past_p = initialPopulation
past_mu = initialMu
# propogate mu
mu.iloc[i] = propogateMu(past_mu, past_p, population.iloc[i], admission.iloc[i],release.iloc[i], anualSentence.iloc[i])
past_mu = mu.iloc[i]
past_p = population.iloc[i]
mu[1994] = initialMu
mu = mu.sort_index()
return mu
def estimateSigmaOfNormalCDF(x,F, mu):
# Given x and F -- CDF at x of a normal distribution with center of mu,
# this function estimate the square root of variance
# This function can take vetor input
# we have function F(x) as the CDF of N(mu, sigma^2)
# given function Phi(x) as the CDF of N(0, 1)
# F(x) = Phi((x - mu) / sigma)
# the reverse function of Phi is norm.ppf()
return (x - mu) / norm.ppf(F)
def estimateSigma(population, release, mu):
# Estimate sigma given population and release data.
# r[t] = F(1; mu, sigma^2) x p[t-1].
# CDF is the cumulative distribution of a normal distribution.
mu = mu.values[: -1]
p = population.values[: -1]
r = release.values
F = r / p
sigma = estimateSigmaOfNormalCDF(1,F, mu)
sigma = sigma[sigma >= 0]
return sigma.mean()
def sampleFutureAdmissionSentence(sampleMean,sampleSD, year_end, dataType):
# Given the mean and standard deviation,
# this function samples an array of admission or sentence from a normal distribution,
# ranging from 2017 to year_end
years = range(2017, year_end + 1)
simulated = normal(sampleMean,sampleSD, len(years))
simulated[simulated <= 0.] = sampleMean
if dataType == "Admission":
simulated = np.round(simulated)
simulated = pd.Series(simulated, index=years, name=dataType)
simulated.index.name = "Year"
return simulated
def propogateState(past_mu,past_p, current_a,current_y, epsilon_sd, sigma):
# State propogation function.
# This function estimates release as PDF of a normal distribution with mu as the center.
# The population was propogated as increment of difference between admission and release
# plus proportional error
if np.isnan(sigma):
current_r = (- past_p + current_a * (current_y - 0.5 + past_mu)) / (past_mu - 0.5)
# This dynamics tries to stablize mu
else:
current_r = np.round(norm(past_mu, sigma).cdf(1) * past_p)
# calculate population as the increment of difference between admission and release
current_p = np.round(past_p + current_a - current_r + normal(scale=epsilon_sd) * past_p)
# propogate mu
current_mu = propogateMu(past_mu,past_p, current_p, current_a,current_r, current_y)
return current_mu, current_p, current_r
def solvePopulation(initialMu,initialPopulation, futureAdmission,futureSentence, epsilon_sd, sigma):
# Given the initial mu (mu[2016]) and initial population (p[2016]),
# this function uses simulated future admission and sentence data
# to generate a projection of prision population
futurePopulation = futureAdmission * 0
futurePopulation.name = "Population"
futureRelease = futureAdmission * 0
futureRelease.name = "Release"
futureMu = futureSentence * 0
futureMu.name = "Mu"
# state ptopogation
for i, p in enumerate(futurePopulation):
if i == 0:
past_mu = initialMu
past_p = initialPopulation
futureMu.iloc[i], futurePopulation.iloc[i], futureRelease.iloc[i] = propogateState(past_mu,past_p, futureAdmission.iloc[i],futureSentence.iloc[i], epsilon_sd, sigma)
past_mu = futureMu.iloc[i]
past_p = futurePopulation.iloc[i]
return futureMu, futurePopulation, futureRelease
def modelAndProject_NSamples_OneArea(populationTable, admissionTable,releaseTable, area, annualSentence, initialMu, projectionYear, N):
# This function models the data and approximate major parameters for one jurisdiction:
# epsilon_sd -- standard deviation for the error term,
# sigma -- standard deviation for the distribution of remaining sentence for a random individual prisoner.
# These parameters are then used to sample and simulate projections
# Note: the population, admission and release data range -- 1978-2016
# the anual sentencing data range -- 1995-2016
# N is the number of samples to draw for prediction
# initialMu is the mu[1994]
# projectionYear is the end of projection
# Estimate error term of population increment
population = extractAreaPopulationCount(populationTable, area, "Population")
admission = extractAreaPopulationCount(admissionTable, area, "Admission")
release = extractAreaPopulationCount(releaseTable, area, "Release")
# data range 1978-2017
epsilon = (population.diff() - admission + release).values[1:] / population.values[:-1]
epsilon_sd = epsilon.std()
# Model state propogation
modernPopulation = population.loc[1995 :]
modernAdmission = admission.loc[1995 :]
modernRelease = release.loc[1995 :]
# data range 1995-2016
# estimate mu[t]
initialPopulation = population[1994] # use 1994 for initial population
mu = solveMu(initialMu,initialPopulation, modernPopulation, modernAdmission,modernRelease, annualSentence)
# solveMu already concates initial state of mu to the output
# estimate sigma
modernPopulation = population.loc[1994 :]
# include the initial state of population
sigma = estimateSigma(modernPopulation, modernRelease, mu)
# Project to from 2017 to projectionYear
# Model and sample N sets of future admission and sentence
admissionSD = modernAdmission.std()
futureAdmissions = [sampleFutureAdmissionSentence(modernAdmission.iloc[-1],admissionSD, 2050, "Admission") for i in range(N)]
sentenceSD = annualSentence.std()
futureSentences = [sampleFutureAdmissionSentence(annualSentence.iloc[-1],sentenceSD, 2050, "Sentence") for i in range(N)]
# Project future mu, population and release
futureMus, futurePopulations, futureReleases = zip(*map(solvePopulation, repeat(mu.iloc[-1]),repeat(modernPopulation.iloc[-1]), futureAdmissions,futureSentences, repeat(epsilon_sd), repeat(sigma)))
futurePopulations = pd.concat(futurePopulations, axis=1)
futureAdmissions = pd.concat(futureAdmissions, axis=1)
futureReleases = pd.concat(futureReleases, axis=1)
return futurePopulations, futureAdmissions,futureReleases
In order to make projection of the future prisoner population, unknown parameters and latent variables need to be estimated first. Given past anual survey of the prisoner population, admission and release, equation $p[t] - p[t - 1] = a[t] - r[t] + \varepsilon p[t - 1]$ can be used to estimate the variance of $\varepsilon$. Provided the average sentence of the newly admitted and the initial state of $\mu [t]$ ($\mu [1994]$ due to the lack of anual sentencing data prior to 1995), equation $p[t]\mu [t] = p[t - 1](\mu [t - 1] - 1) + 0.5r[t] + a[t]y[t] - 0.5a[t]$ can be used to popogate the state of the system and resolve $\mu [t]$. After that, $r[t] = F(1;\mu [t - 1],{\sigma ^2})p[t - 1]$ is used to approximate $\sigma$ for each corresponding year. The results are then averaged to generate a point estimation of $\sigma$.
To make forecast, first, a normal distribution is built based on past admission data to sample future admission. A similar distribution is also built for the past sentencing data to generate a sampler for future sentencing. Since the system state is known for up until 2016, equation $r[t] = F(1;\mu [t - 1],{\sigma ^2})p[t - 1]$ can be used to determine the next year's release while equation $p[t] - p[t - 1] = a[t] - r[t] + \varepsilon p[t - 1]$ can be used to sample for the following year's prisoner population. Finally, using equation $p[t]\mu [t] = p[t - 1](\mu [t - 1] - 1) + 0.5r[t] + a[t]y[t] - 0.5a[t]$ to propogate $\mu [t]$, one full iteration of state evolution is completed.
def collapseSamples(sampleTable, area):
# Calculate the mean and standar deviation of a group of projections for one jurisdiction
sampleMean = sampleTable.mean(axis=1)
sampleMean.name = area
sampleSD = sampleTable.std(axis=1)
sampleSD.name = area
return sampleMean, sampleSD
def modelAndProjectPopulation_NSamples(populationTable, admissionTable,releaseTable, annualSentence, initialMu, projectionYear, N):
# Models and makes forecast N projections for all the jurisdictions
# Return the mean and standard deviation of those N projections
areas = populationTable.index
futurePopulationMeans = []
futurePopulationSDs = []
futureAdmissionMeans = []
futureAdmissionSDs = []
futureReleaseMeans = []
futureReleaseSDs = []
for area in areas:
futurePopulations, futureAdmissions,futureReleases = modelAndProject_NSamples_OneArea(populationTable, admissionTable,releaseTable, area, annualSentence, initialMu, projectionYear, N)
futurePopulationMean, futurePopulationSD = collapseSamples(futurePopulations, area)
futurePopulationMeans.append(futurePopulationMean)
futurePopulationSDs.append(futurePopulationSD)
futureAdmissionMean, futureAdmissionSD = collapseSamples(futureAdmissions, area)
futureAdmissionMeans.append(futureAdmissionMean)
futureAdmissionSDs.append(futureAdmissionSD)
futureReleaseMean, futureReleaseSD = collapseSamples(futureReleases, area)
futureReleaseMeans.append(futureReleaseMean)
futureReleaseSDs.append(futureReleaseSD)
futurePopulationMeans = pd.concat(futurePopulationMeans, axis=1).transpose()
futureAdmissionMeans = pd.concat(futureAdmissionMeans, axis=1).transpose()
futureReleaseMeans = pd.concat(futureReleaseMeans, axis=1).transpose()
futurePopulationSDs = pd.concat(futurePopulationSDs, axis=1).transpose()
futureAdmissionSDs = pd.concat(futureAdmissionSDs, axis=1).transpose()
futureReleaseSDs = pd.concat(futureReleaseSDs, axis=1).transpose()
return futurePopulationMeans,futurePopulationSDs, futureAdmissionMeans,futureAdmissionSDs, futureReleaseMeans,futureReleaseSDs
maleFuturePopulationMean_5,maleFuturePopulationSD_5, maleFutureAdmissionMean_5,maleFutureAdmissionSD_5, maleFutureReleaseMean_5,maleFutureReleaseSD_5 = modelAndProjectPopulation_NSamples(malePrisonerPopulation, maleAdmission,maleRelease, annualSentence, 5, 2050, 500)
maleFuturePopulationMean_10,maleFuturePopulationSD_10, maleFutureAdmissionMean_10,maleFutureAdmissionSD_10, maleFutureReleaseMean_10,maleFutureReleaseSD_10 = modelAndProjectPopulation_NSamples(malePrisonerPopulation, maleAdmission,maleRelease, annualSentence, 10, 2050, 500)
maleFuturePopulationMean_20,maleFuturePopulationSD_20, maleFutureAdmissionMean_20,maleFutureAdmissionSD_20, maleFutureReleaseMean_20,maleFutureReleaseSD_20 = modelAndProjectPopulation_NSamples(malePrisonerPopulation, maleAdmission,maleRelease, annualSentence, 20, 2050, 500)
maleFuturePopulationMean_40,maleFuturePopulationSD_40, maleFutureAdmissionMean_40,maleFutureAdmissionSD_40, maleFutureReleaseMean_40,maleFutureReleaseSD_40 = modelAndProjectPopulation_NSamples(malePrisonerPopulation, maleAdmission,maleRelease, annualSentence, 40, 2050, 500)
femaleFuturePopulationMean_5,femaleFuturePopulationSD_5, femaleFutureAdmissionMean_5,femaleFutureAdmissionSD_5, femaleFutureReleaseMean_5,femaleFutureReleaseSD_5 = modelAndProjectPopulation_NSamples(femalePrisonerPopulation, femaleAdmission,femaleRelease, annualSentence, 5, 2050, 500)
femaleFuturePopulationMean_10,femaleFuturePopulationSD_10, femaleFutureAdmissionMean_10,femaleFutureAdmissionSD_10, femaleFutureReleaseMean_10,femaleFutureReleaseSD_10 = modelAndProjectPopulation_NSamples(femalePrisonerPopulation, femaleAdmission,femaleRelease, annualSentence, 10, 2050, 500)
femaleFuturePopulationMean_20,femaleFuturePopulationSD_20, femaleFutureAdmissionMean_20,femaleFutureAdmissionSD_20, femaleFutureReleaseMean_20,femaleFutureReleaseSD_20 = modelAndProjectPopulation_NSamples(femalePrisonerPopulation, femaleAdmission,femaleRelease, annualSentence, 20, 2050, 500)
femaleFuturePopulationMean_40,femaleFuturePopulationSD_40, femaleFutureAdmissionMean_40,femaleFutureAdmissionSD_40, femaleFutureReleaseMean_40,femaleFutureReleaseSD_40 = modelAndProjectPopulation_NSamples(femalePrisonerPopulation, femaleAdmission,femaleRelease, annualSentence, 40, 2050, 500)
maleFuturePopulationRegionSumMean_5,maleFuturePopulationRegionSumSD_5, maleFutureAdmissionRegionSumMean_5,maleFutureAdmissionRegionSumSD_5, maleFutureReleaseRegionSumMean_5,maleFutureReleaseRegionSumSD_5 = modelAndProjectPopulation_NSamples(malePrisonerPopulationRegionSum, maleAdmissionRegionSum,maleReleaseRegionSum, annualSentence, 5, 2050, 500)
maleFuturePopulationRegionSumMean_10,maleFuturePopulationRegionSumSD_10, maleFutureAdmissionRegionSumMean_10,maleFutureAdmissionRegionSumSD_10, maleFutureReleaseRegionSumMean_10,maleFutureReleaseRegionSumSD_10 = modelAndProjectPopulation_NSamples(malePrisonerPopulationRegionSum, maleAdmissionRegionSum,maleReleaseRegionSum, annualSentence, 10, 2050, 500)
maleFuturePopulationRegionSumMean_20,maleFuturePopulationRegionSumSD_20, maleFutureAdmissionRegionSumMean_20,maleFutureAdmissionRegionSumSD_20, maleFutureReleaseRegionSumMean_20,maleFutureReleaseRegionSumSD_20 = modelAndProjectPopulation_NSamples(malePrisonerPopulationRegionSum, maleAdmissionRegionSum,maleReleaseRegionSum, annualSentence, 20, 2050, 500)
maleFuturePopulationRegionSumMean_40,maleFuturePopulationRegionSumSD_40, maleFutureAdmissionRegionSumMean_40,maleFutureAdmissionRegionSumSD_40, maleFutureReleaseRegionSumMean_40,maleFutureReleaseRegionSumSD_40 = modelAndProjectPopulation_NSamples(malePrisonerPopulationRegionSum, maleAdmissionRegionSum,maleReleaseRegionSum, annualSentence, 40, 2050, 500)
femaleFuturePopulationRegionSumMean_5,femaleFuturePopulationRegionSumSD_5, femaleFutureAdmissionRegionSumMean_5,femaleFutureAdmissionRegionSumSD_5, femaleFutureReleaseRegionSumMean_5,femaleFutureReleaseRegionSumSD_5 = modelAndProjectPopulation_NSamples(femalePrisonerPopulationRegionSum, femaleAdmissionRegionSum,femaleReleaseRegionSum, annualSentence, 5, 2050, 500)
femaleFuturePopulationRegionSumMean_10,femaleFuturePopulationRegionSumSD_10, femaleFutureAdmissionRegionSumMean_10,femaleFutureAdmissionRegionSumSD_10, femaleFutureReleaseRegionSumMean_10,femaleFutureReleaseRegionSumSD_10 = modelAndProjectPopulation_NSamples(femalePrisonerPopulationRegionSum, femaleAdmissionRegionSum,femaleReleaseRegionSum, annualSentence, 10, 2050, 500)
femaleFuturePopulationRegionSumMean_20,femaleFuturePopulationRegionSumSD_20, femaleFutureAdmissionRegionSumMean_20,femaleFutureAdmissionRegionSumSD_20, femaleFutureReleaseRegionSumMean_20,femaleFutureReleaseRegionSumSD_20 = modelAndProjectPopulation_NSamples(femalePrisonerPopulationRegionSum, femaleAdmissionRegionSum,femaleReleaseRegionSum, annualSentence, 20, 2050, 500)
femaleFuturePopulationRegionSumMean_40,femaleFuturePopulationRegionSumSD_40, femaleFutureAdmissionRegionSumMean_40,femaleFutureAdmissionRegionSumSD_40, femaleFutureReleaseRegionSumMean_40,femaleFutureReleaseRegionSumSD_40 = modelAndProjectPopulation_NSamples(femalePrisonerPopulationRegionSum, femaleAdmissionRegionSum,femaleReleaseRegionSum, annualSentence, 40, 2050, 500)
maleFuturePopulationDivisionSumMean_5,maleFuturePopulationDivisionSumSD_5, maleFutureAdmissionDivisionSumMean_5,maleFutureAdmissionDivisionSumSD_5, maleFutureReleaseDivisionSumMean_5,maleFutureReleaseDivisionSumSD_5 = modelAndProjectPopulation_NSamples(malePrisonerPopulationDivisionSum, maleAdmissionDivisionSum,maleReleaseDivisionSum, annualSentence, 5, 2050, 500)
maleFuturePopulationDivisionSumMean_10,maleFuturePopulationDivisionSumSD_10, maleFutureAdmissionDivisionSumMean_10,maleFutureAdmissionDivisionSumSD_10, maleFutureReleaseDivisionSumMean_10,maleFutureReleaseDivisionSumSD_10 = modelAndProjectPopulation_NSamples(malePrisonerPopulationDivisionSum, maleAdmissionDivisionSum,maleReleaseDivisionSum, annualSentence, 10, 2050, 500)
maleFuturePopulationDivisionSumMean_20,maleFuturePopulationDivisionSumSD_20, maleFutureAdmissionDivisionSumMean_20,maleFutureAdmissionDivisionSumSD_20, maleFutureReleaseDivisionSumMean_20,maleFutureReleaseDivisionSumSD_20 = modelAndProjectPopulation_NSamples(malePrisonerPopulationDivisionSum, maleAdmissionDivisionSum,maleReleaseDivisionSum, annualSentence, 20, 2050, 500)
maleFuturePopulationDivisionSumMean_40,maleFuturePopulationDivisionSumSD_40, maleFutureAdmissionDivisionSumMean_40,maleFutureAdmissionDivisionSumSD_40, maleFutureReleaseDivisionSumMean_40,maleFutureReleaseDivisionSumSD_40 = modelAndProjectPopulation_NSamples(malePrisonerPopulationDivisionSum, maleAdmissionDivisionSum,maleReleaseDivisionSum, annualSentence, 40, 2050, 500)
femaleFuturePopulationDivisionSumMean_5,femaleFuturePopulationDivisionSumSD_5, femaleFutureAdmissionDivisionSumMean_5,femaleFutureAdmissionDivisionSumSD_5, femaleFutureReleaseDivisionSumMean_5,femaleFutureReleaseDivisionSumSD_5 = modelAndProjectPopulation_NSamples(femalePrisonerPopulationDivisionSum, femaleAdmissionDivisionSum,femaleReleaseDivisionSum, annualSentence, 5, 2050, 500)
femaleFuturePopulationDivisionSumMean_10,femaleFuturePopulationDivisionSumSD_10, femaleFutureAdmissionDivisionSumMean_10,femaleFutureAdmissionDivisionSumSD_10, femaleFutureReleaseDivisionSumMean_10,femaleFutureReleaseDivisionSumSD_10 = modelAndProjectPopulation_NSamples(femalePrisonerPopulationDivisionSum, femaleAdmissionDivisionSum,femaleReleaseDivisionSum, annualSentence, 10, 2050, 500)
femaleFuturePopulationDivisionSumMean_20,femaleFuturePopulationDivisionSumSD_20, femaleFutureAdmissionDivisionSumMean_20,femaleFutureAdmissionDivisionSumSD_20, femaleFutureReleaseDivisionSumMean_20,femaleFutureReleaseDivisionSumSD_20 = modelAndProjectPopulation_NSamples(femalePrisonerPopulationDivisionSum, femaleAdmissionDivisionSum,femaleReleaseDivisionSum, annualSentence, 20, 2050, 500)
femaleFuturePopulationDivisionSumMean_40,femaleFuturePopulationDivisionSumSD_40, femaleFutureAdmissionDivisionSumMean_40,femaleFutureAdmissionDivisionSumSD_40, femaleFutureReleaseDivisionSumMean_40,femaleFutureReleaseDivisionSumSD_40 = modelAndProjectPopulation_NSamples(femalePrisonerPopulationDivisionSum, femaleAdmissionDivisionSum,femaleReleaseDivisionSum, annualSentence, 40, 2050, 500)
def plotFederalProjection_GenderPopulation(population,futurePopulation,futurePopulation_error, admission,futureAdmission,futureAdmission_error, release,futureRelease,futureRelease_error, sex,initialMu,maxPopulation):
pd.concat([population, futurePopulation]).plot(color="darkorange", label="population")
plt.fill_between(futurePopulation.index, futurePopulation - futurePopulation_error, futurePopulation + futurePopulation_error, color="darkorange", alpha=0.25)
pd.concat([admission, futureAdmission]).plot(color="crimson", label="admission")
plt.fill_between(futureAdmission.index, futureAdmission - futurePopulation_error, futureAdmission + futureAdmission_error, color="crimson", alpha=0.25)
pd.concat([release, futureRelease]).plot(color="steelblue", label="release")
plt.fill_between(futureRelease.index, futureRelease - futureRelease_error, futureRelease + futureRelease_error, color="steelblue", alpha=0.25)
plt.axvline(1995, color="grey")
plt.legend(loc="upper left")
plt.ylim([0, maxPopulation])
plt.ylabel("Head Count")
plt.title("Projection of Federal " + sex + " Prisoner Population, mu[1994]=" + str(initialMu))
plt.show()
def plotRegionalProjection_GenderPopulation(population,futurePopulation,futurePopulation_error, sex,initialMu,maxPopulation):
population = population.sort_index().drop("Federal").transpose()
futurePopulation = futurePopulation.sort_index().drop("Federal").transpose()
futurePopulation_error = futurePopulation_error.drop("Federal").sort_index().transpose()
pd.concat([population, futurePopulation]).plot()
plt.gca().set_prop_cycle(None)
for i in range(population.shape[1]):
plt.fill_between(futurePopulation.index, futurePopulation.iloc[:, i] - futurePopulation_error.iloc[:, i], futurePopulation.iloc[:, i] + futurePopulation_error.iloc[:, i], alpha=0.25)
plt.axvline(1995, color="grey")
plt.legend(loc="upper left")
plt.ylim([0, maxPopulation])
plt.ylabel("Head Count")
plt.title("Projection of Regional " + sex + " Prisoner Population, mu[1994]=" + str(initialMu))
plt.show()
def alignDivisionTable(table, divisionTable, regions):
regions = regions.set_index("State")
divisionTable = divisionTable.drop("Federal")
divisionTable = divisionTable.sort_index()
table = regions.join(table).reset_index().set_index(["Region","Division", "State"]).sort_index()
return table, divisionTable
def plotDivisionalPopulationProjection_GenderPopulation(population,futurePopulation,futurePopulation_error, divisionalPopulation,futureDivisionalPopulation,futureDivisionalPopulation_error, sex,initialMu):
population, divisionalPopulation = alignDivisionTable(population, divisionalPopulation, regions)
futurePopulation, futureDivisionalPopulation = alignDivisionTable(futurePopulation, futureDivisionalPopulation, regions)
futurePopulation_error, futureDivisionalPopulation_error = alignDivisionTable(futurePopulation_error, futureDivisionalPopulation_error, regions)
regionList = population.index.get_level_values(0).unique()
for region in regionList:
divisionalPopulation_1Region = divisionalPopulation.loc[region]
futureDivisionalPopulation_1Region = futureDivisionalPopulation.loc[region]
futureDivisionalPopulation_error_1Region = futureDivisionalPopulation_error.loc[region]
plt.figure()
pd.concat([divisionalPopulation_1Region, futureDivisionalPopulation_1Region], axis=1).transpose().plot()
plt.legend(loc="upper left")
plt.gca().set_prop_cycle(None)
for index, row in futureDivisionalPopulation_1Region.iterrows():
plt.fill_between(row.index.values, row - futureDivisionalPopulation_error_1Region.loc[index], row + futureDivisionalPopulation_error_1Region.loc[index], alpha=0.25)
plt.axvline(1995, color="grey")
plt.ylabel("Population")
plt.title(region + " " + sex + " Population, mu[1994]=" + str(initialMu))
population_1Region = population.loc[region]
futurePopulation_1Region = futurePopulation.loc[region]
futurePopulation_error_1Region = futurePopulation_error.loc[region]
divisions = population_1Region.index.get_level_values(0).unique()
for division in divisions:
population_1Division = population_1Region.loc[division]
futurePopulation_1Division = futurePopulation_1Region.loc[division]
futurePopulation_error_1Division = futurePopulation_error_1Region.loc[division]
plt.figure()
pd.concat([population_1Division, futurePopulation_1Division], axis=1).transpose().plot()
plt.legend(loc="upper left")
plt.gca().set_prop_cycle(None)
for index, row in futurePopulation_1Division.iterrows():
plt.fill_between(row.index.values.astype(int), row - futurePopulation_error_1Division.loc[index], row + futurePopulation_error_1Division.loc[index], alpha=0.25)
plt.ylim(bottom=0)
plt.axvline(1995, color="grey")
plt.ylabel("Population")
plt.title(region + " -- " + division + " " + sex + " Population, mu[1994]=" + str(initialMu))
Here, an initial state of $\mu[1994] = 20$ is used to solve the latent variables and to estimate the parameters for both the male and female facilities, following which 250 simulations are generated and averaged to build a projection with standard deviaion plotted as shaded areas.
plotFederalProjection_GenderPopulation(malePrisonerPopulation.loc["Federal"], maleFuturePopulationMean_20.loc["Federal"],maleFuturePopulationSD_20.loc["Federal"],maleAdmission.loc["Federal"], maleFutureAdmissionMean_20.loc["Federal"],maleFutureAdmissionSD_20.loc["Federal"], maleRelease.loc["Federal"], maleFutureReleaseMean_20.loc["Federal"],maleFutureReleaseSD_20.loc["Federal"], "Male", 20, 210000)
plotRegionalProjection_GenderPopulation(malePrisonerPopulationRegionSum, maleFuturePopulationRegionSumMean_20,maleFuturePopulationRegionSumSD_20, "Male", 20, 800000)
plotDivisionalPopulationProjection_GenderPopulation(malePrisonerPopulation,maleFuturePopulationMean_20,maleFuturePopulationSD_20, malePrisonerPopulationDivisionSum,maleFuturePopulationDivisionSumMean_20,maleFuturePopulationDivisionSumSD_20, "Male",20)
The projection of federal prisoner population predicts that the male population will reach a steady state soon following the current decrease that started in mid-2010s. Among the four regions, the prisoner population of all regions but the South has entered a stable period.
In the East North Central division of the Midwest, we can expect a continuous decrease in prisoner population before it starts to stabilize in early 2020s. This decline is mostly due to the decrease of prisoner population in Illinois.
In the Northeast, the current contraction of Mid-Atlantic prisoner population will terminate by 2020, entering a slow growth which is mostly driven by a constant increase of Pennsylvania’s prisoner population.
In the south, both South Atlantic and West South Central can see a slight but continuous growth of prisoner population. In the East South Central division, Kentucky will experience a surge of growth which then transformed into a period of slow but steady development. In the West South Central, Texas continues to lead a sustained increase of prisoner population. In the south Atlantic, the current shrinkage of Florida’s prisoner body will soon enter a period very moderate but elongated expansion.
In the West, both Mountain and Pacific divisions will witness continuous but stagnant increments in their prisoner population. The increase of prisoner population in California will started to dwindle by mid-2020s, while we may expect a volatile upsurge in Washington before it is substituted with a much milder inflation.
plotFederalProjection_GenderPopulation(femalePrisonerPopulation.loc["Federal"], femaleFuturePopulationMean_20.loc["Federal"],femaleFuturePopulationSD_20.loc["Federal"],femaleAdmission.loc["Federal"], femaleFutureAdmissionMean_20.loc["Federal"],femaleFutureAdmissionSD_20.loc["Federal"], femaleRelease.loc["Federal"], femaleFutureReleaseMean_20.loc["Federal"],femaleFutureReleaseSD_20.loc["Federal"], "Female", 20, 18000)
plotRegionalProjection_GenderPopulation(femalePrisonerPopulationRegionSum, femaleFuturePopulationRegionSumMean_20,femaleFuturePopulationRegionSumSD_20, "Female", 20, 55000)
plotDivisionalPopulationProjection_GenderPopulation(femalePrisonerPopulation,femaleFuturePopulationMean_20,femaleFuturePopulationSD_20, femalePrisonerPopulationDivisionSum,femaleFuturePopulationDivisionSumMean_20,femaleFuturePopulationDivisionSumSD_20, "Female",20)
Contradicted to the steady projected growth in the federal female prisoner population, all four regions can expect a sustained decrease in their female prisoner population. Although in most states, the female prisoner population will either maintain the current level, if not very slight increase, or experience a developing shrinkage. Pennsylvania, Alabama, North Carolina, West Virginia, Delaware, Texas, Nevada, New Mexico and Montana can expect to see a significant increase in their female prisoner population.
def modelAndProjectCapacity_NSamples(populationTable, admissionTable,releaseTable, annualSentence, capacity, initialMu, projectionYear, N):
areas = populationTable.index
futureOccupancies = {}
futureOccupancyMean = []
futureOccupancySD = []
for area in areas:
if area in capacity.index:
futurePopulation, _,_ = modelAndProject_NSamples_OneArea(populationTable, admissionTable,releaseTable, area, annualSentence, initialMu, projectionYear, N)
# futurePopulation.sort_index(inplace=True)
futureOccupancy = futurePopulation / capacity[area]
futureOccupancies[area] = futureOccupancy
occupancyMean = futureOccupancy.mean(axis=1)
occupancyMean.index.name = "Year"
occupancyMean.name = area
occupancySD = futureOccupancy.std(axis=1)
occupancySD.index.name = "Year"
occupancySD.name = area
futureOccupancyMean.append(occupancyMean)
futureOccupancySD.append(occupancySD)
futureOccupancyMean = pd.concat(futureOccupancyMean, axis=1).transpose()
futureOccupancyMean.index.name = "Jurisdiction"
futureOccupancySD = pd.concat(futureOccupancySD, axis=1).transpose()
futureOccupancySD.index.name = "Jurisdiction"
return futureOccupancies, futureOccupancyMean,futureOccupancySD
futureOccupancies_5, futureOccupancyMean_5,futureOccupancySD_5 = modelAndProjectCapacity_NSamples(totalPrisonerPopulation, totalAdmission,totalRelease, annualSentence, capacity, 5, 2050, 500)
futureOccupancies_10, futureOccupancyMean_10,futureOccupancySD_10 = modelAndProjectCapacity_NSamples(totalPrisonerPopulation, totalAdmission,totalRelease, annualSentence, capacity, 10, 2050, 500)
futureOccupancies_20, futureOccupancyMean_20,futureOccupancySD_20 = modelAndProjectCapacity_NSamples(totalPrisonerPopulation, totalAdmission,totalRelease, annualSentence, capacity, 20, 2050, 500)
futureOccupancies_40, futureOccupancyMean_40,futureOccupancySD_40 = modelAndProjectCapacity_NSamples(totalPrisonerPopulation, totalAdmission,totalRelease, annualSentence, capacity, 40, 2050, 500)
def orderCapacityByRegion(table, regions):
table = futureOccupancyMean_5
regions = regions.set_index("State")
table = pd.concat((regions, table), axis=1)
table.index.name = "State"
table.loc["Federal","Region"] = "Federal"
table.loc["Federal","Division"] = "Federal"
table = table.dropna(axis=0)
table = table.reset_index().set_index(["Region", "Division", "State"]).sort_index()
table.columns.name = "Year"
table.columns = table.columns.astype(int)
return table
def plotOccupancyProjection(occupancy, occupancy_error, initialMu):
occupancy = orderCapacityByRegion(occupancy, regions)
occupancy_error = orderCapacityByRegion(occupancy_error, regions)
for region in occupancy.index.get_level_values(0).unique():
for division in occupancy.loc[region].index.get_level_values(0).unique():
occupancy_1Division = occupancy.loc[region, division]
occupancy_error_1Division = occupancy_error.loc[region, division]
plt.figure()
occupancy_1Division.transpose().plot()
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.15), fancybox=True, shadow=True, ncol=occupancy_1Division.shape[0])
if region != "Federal":
plt.title(region + " -- " + division + " Occupancy Projection, mu[1994]=" + str(initialMu))
else:
plt.title("Federal Occupancy Projection, mu[1994]=" + str(initialMu))
plt.ylabel("Occupancy")
def checkOccupancyThresholdCrossing(occupancies, upThreshold,downThreshold, initialMu):
crossingupYear = {}
crossingdownYear = {}
for area in occupancies:
occupancy = occupancies[area]
# Use one-sample t-test to discover threshold-crossing timing
t, p_up = ttest_1samp(occupancy.values.transpose(), upThreshold)
p_up = p_up/2
logical_index = np.logical_and(p_up < 0.05, occupancy.mean(axis=1) > upThreshold)
if logical_index.sum() > 0:
crossingupYear[area] = occupancy.index[logical_index].min()
t, p_down = ttest_1samp(occupancy.values.transpose(), downThreshold)
p_up = p_up/2
logical_index = np.logical_and(p_up < 0.05, occupancy.mean(axis=1) < downThreshold)
if logical_index.sum() > 0:
crossingdownYear[area] = occupancy.index[logical_index].min()
crossingupYear = pd.Series(crossingupYear).sort_values()
crossingdownYear = pd.Series(crossingdownYear).sort_values()
print("mu[1994] = " + str(initialMu))
print("--------------------")
if not crossingupYear.empty:
print("Year when occupancy rises above " + str(upThreshold) + ":")
print(crossingupYear)
print("--------------------")
if not crossingupYear.empty:
print("Year when occupancy drops below " + str(downThreshold) + ":")
print(crossingdownYear)
return crossingupYear, crossingdownYear
The simulated population projections are divided by the capacity of the prison system to reflect a forecast of facility occupancy if no new prison is to be built in the near future. The capacity data used here are the newest available, surveyed in 2016. Again, 250 simulations have been generated, and the mean is plotted in the following figures. The group of projections are then used determine the timing when the occupancy rate will cross a certain threshold through a one-sample ttest with significance level set as 0.05.
plotOccupancyProjection(futureOccupancyMean_20,futureOccupancySD_20, 20)
The federal prison occupancy will reach its peak level just shy of 150% by early 2020s, before entering a decrease. While most states will suffer a moderate increase of prison occupancy and be forced to function at a relatively over-crowded capacity if no new facility is to be built, the prison occupancy rate of Vermont will rise very drastically to above 500% by 2025. On the other hand, we can expect the occupancy in Illinois, Minnesota and North Dakota to reduce significantly to the level below 100%.
crossingupYear, crossingdownYear = checkOccupancyThresholdCrossing(futureOccupancies_20, 1.5,1, 20)
The analysis indicates that the occupancy rate of prisons in Iowa, South Dakota, Virginia and Oklahoma will rise over 150% within the next two decades while the prison systems in 8 other states have already been too full.
On the other hand, the analysis identifies 16 states that do not currently suffer from an over-crowded prison system. If the current trend continues, even without building new facilities, the occupancy rates in Minnesota, North Dakota and Illinois will fall below 100% in the near future.
It is recommended that new prisons be built in Kentucky, Lousiana, Montana, Tennessee, Vermont, Washington, Deleware, Idaho, Iowa and South Dakota as the need there is very imminent. More engagement with the justuce system of Virginia and Oklahoma is also encouraged in the future as the prison occupancy rate there are still rising, and the facilities will become over-crowded by late 2030s.
In Alaska, South Carolina, Rhode Island, Oregon, New Jersey, Nevada, Mississippi, Utah, Maryland, Maine, Indiana, Georgia, Florida, Arizona, Massachusetts and Minnesota, there seems to be no market for the private prison industry to expand. As the prison occupancy rate in North Dokota will drop below 100% in early 2020s, future investment plan in that state should hold off. Although the market for private prisons is still existent in Illinois for now, investors should expect significant contranction by early 2030s.
In this project, the national sentencing data is used for modelling individual jurisdictions. Ideally, each jurisdiction should adopt its own data because the national data does not account for the variance among the states. Using national data may create more problems especially for jurisdictions with relatively smaller prison populaiton and admission for their sentencing statistics should be a lot less stable than larger populations'. However, only the national data have been made readily available by the U.S. Sentencing Comission.
totalFuturePopulationMean_5,totalFuturePopulationSD_5, totalFutureAdmissionMean_5,totalFutureAdmissionSD_5, totalFutureReleaseMean_5,totalFutureReleaseSD_5 = modelAndProjectPopulation_NSamples(totalPrisonerPopulation, totalAdmission,totalRelease, annualSentence, 5, 2050, 250)
totalFuturePopulationMean_10,totalFuturePopulationSD_10, totalFutureAdmissionMean_10,totalFutureAdmissionSD_10, totalFutureReleaseMean_10,totalFutureReleaseSD_10 = modelAndProjectPopulation_NSamples(totalPrisonerPopulation, totalAdmission,totalRelease, annualSentence, 10, 2050, 250)
totalFuturePopulationMean_20,totalFuturePopulationSD_20, totalFutureAdmissionMean_20,totalFutureAdmissionSD_20, totalFutureReleaseMean_20,totalFutureReleaseSD_20 = modelAndProjectPopulation_NSamples(totalPrisonerPopulation, totalAdmission,totalRelease, annualSentence, 20, 2050, 250)
totalFuturePopulationMean_40,totalFuturePopulationSD_40, totalFutureAdmissionMean_40,totalFutureAdmissionSD_40, totalFutureReleaseMean_40,totalFutureReleaseSD_40 = modelAndProjectPopulation_NSamples(totalPrisonerPopulation, totalAdmission,totalRelease, annualSentence, 40, 2050, 250)
def plotFederalProjection_Population(population, futurePopulations,initialMus, maxPopulation):
population.plot(color="grey", label="")
for i, futurePopulation in enumerate(futurePopulations):
futurePopulation[2016] = population[2016]
futurePopulation.sort_index(inplace=True)
futurePopulation.plot(label = initialMus[i])
plt.axvline(1995, color="grey")
plt.legend(loc="upper left", title="mu[1994]")
plt.ylim([0, maxPopulation])
plt.ylabel("Head Count")
plt.title("Projection of Federal Prisoner Population, varying mu[1994]")
plt.show()
totalFederalPopulationProjections = [totalFuturePopulationMean_5.loc["Federal"], totalFuturePopulationMean_10.loc["Federal"], totalFuturePopulationMean_20.loc["Federal"], totalFuturePopulationMean_40.loc["Federal"]]
plotFederalProjection_Population(totalPrisonerPopulation.loc["Federal"], totalFederalPopulationProjections,[5, 10, 20, 40], 225000)
Because no data is available for establishing a reasonable guess for the initial state of $\mu [t]$, several values have been tested ($\mu [1994] = 5,10,20,40)$. Eventually, 20 seems to be the best guess for several reasons.
First of all, 20 seems to be a rational value due to practical constraints. As prisoners with short sentences get released sooner, the average remaining sentence of the prisoner population body is unlikely to be a small number, such as 5. Because human lifespan is limited, the center of the distribution should not be a very large number, such as 40, either.
Secondly, the simulation results suggest that, as shown in the figure above, the transition between existing data points and projection is rendered more naturally when $\mu [1994] = 20$. Although from a long-term perpective trajectories with $\mu [1994]$ less than 20 seem to converge very closely to the $\mu [1994] = 20$ trajectory, their transition from data to prediction seems very coerced. This is because in the modelling process, if the center of the distribution is under-estimated, but the cummulative distribution function has to satisfy $r [t] = F(1;\mu [t - 1],{\sigma ^2})p[t - 1]$, the algorithm will have to under-estimate the variance as well. The under-estimated variance can cause the algorithm to the release number in the projection stage, thus generating a surge in popultion.
Future work should focus on collecting more data which would help build a more realistic model.
More research is needed to generate a reasonable guess of $\mu [1994]$ which can be backed by evidence. Currently, the simulation is based on many assumptions of normal distributions, especially the one that describes the remaining sentence of individual prisoners. If more survey is conducted, it might become possible to build a more faithful model for the distribution of remaining sentence, which can lead to more realistic projections of the release number.
It is also necessary to collect anual sentencing data of the newly admitted prisoners in each state. The reason is twofold. First, state-specific data provides more accuracy. Secondly, currently the lack of sufficient data makes it impossible to generalize a tendency of more lenient sentencing. If data from most states all indicate a similar pattern, it can support a model that predicts a gradually more lenient sentencing.